Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Key Ideas

Overall Architecture

Figure: The architecture of Swin Transformer.
Figure: Detailed architecture specifications.
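The hierarchy can be illustrated with a minimal NumPy sketch of patch merging: each 2×2 group of neighboring patches is concatenated (4C channels) and linearly projected to 2C channels, halving the spatial resolution at every stage. This is a sketch under stated assumptions, not the official implementation: `patch_merging` is a hypothetical helper, the projection weights are random placeholders, the Swin Transformer blocks between merging layers are omitted, and the 224×224 input with 4×4 patches and C = 96 (as in Swin-T) is an assumed configuration.

```python
import numpy as np

def patch_merging(x, weight):
    """Merge each 2x2 group of neighboring patches (hypothetical helper).

    x:      (H, W, C) feature map with even H and W.
    weight: (4*C, 2*C) projection matrix (random placeholder, not trained).
    Returns a (H/2, W/2, 2*C) feature map.
    """
    H, W, C = x.shape
    assert H % 2 == 0 and W % 2 == 0, "H and W must be even"
    x0 = x[0::2, 0::2, :]  # top-left patch of each 2x2 group
    x1 = x[1::2, 0::2, :]  # bottom-left
    x2 = x[0::2, 1::2, :]  # top-right
    x3 = x[1::2, 1::2, :]  # bottom-right
    merged = np.concatenate([x0, x1, x2, x3], axis=-1)  # (H/2, W/2, 4C)
    # A normalization step before the projection is omitted in this sketch.
    return merged @ weight  # linear layer 4C -> 2C

rng = np.random.default_rng(0)
x = rng.standard_normal((56, 56, 96))  # stage 1: 224x224 image, 4x4 patches, C = 96
shapes = [x.shape]
for _ in range(3):  # stages 2, 3, 4
    C = x.shape[-1]
    x = patch_merging(x, 0.01 * rng.standard_normal((4 * C, 2 * C)))
    shapes.append(x.shape)
print(shapes)
# [(56, 56, 96), (28, 28, 192), (14, 14, 384), (7, 7, 768)]
```

Each merge halves the resolution and doubles the channel count, so the four stages produce feature maps downsampled by 4×, 8×, 16×, and 32× relative to the input.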

Main Characteristics

Figure: Swin Transformer characteristics vs. ViT.
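The main efficiency difference from ViT can be checked numerically with the complexity expressions reported in the Swin paper for a feature map of h×w patches with C channels and non-overlapping M×M windows (softmax cost ignored): Ω(MSA) = 4hwC² + 2(hw)²C for global attention and Ω(W-MSA) = 4hwC² + 2M²hwC for window attention. The function names and example sizes below are illustrative assumptions.

```python
def msa_flops(h, w, C):
    """Global MSA over all h*w tokens, as in ViT: quadratic in h*w."""
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C

def wmsa_flops(h, w, C, M):
    """Window MSA over M x M windows: linear in h*w for a fixed M."""
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C

h, w, C, M = 56, 56, 96, 7  # assumed stage-1 map of a 224x224 input
print(msa_flops(h, w, C))      # 2003828736
print(wmsa_flops(h, w, C, M))  # 145108992

# Doubling the side length (4x as many tokens):
print(msa_flops(2 * h, 2 * w, C) / msa_flops(h, w, C))          # about 15.3
print(wmsa_flops(2 * h, 2 * w, C, M) / wmsa_flops(h, w, C, M))  # 4.0
```

Because the attention term of W-MSA grows with hw rather than (hw)², the cost scales linearly with image size, which is what makes high-resolution inputs and dense prediction practical.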

Well-designed Details

Shifted windows approach

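Consecutive Swin Transformer blocks alternate regular window attention (W-MSA) and shifted window attention (SW-MSA), where $\hat{z}^l$ and $z^l$ denote the outputs of the (S)W-MSA module and the MLP module of block $l$:

\begin{align}
\hat{z}^{l} &= \text{W-MSA}\left(\text{LN}\left(z^{l-1}\right)\right) + z^{l-1}, \\
z^{l} &= \text{MLP}\left(\text{LN}\left(\hat{z}^{l}\right)\right) + \hat{z}^{l}, \\
\hat{z}^{l+1} &= \text{SW-MSA}\left(\text{LN}\left(z^{l}\right)\right) + z^{l}, \\
z^{l+1} &= \text{MLP}\left(\text{LN}\left(\hat{z}^{l+1}\right)\right) + \hat{z}^{l+1}
\end{align}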
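The four equations can be traced with a minimal NumPy sketch of two consecutive blocks. It is a simplification under loudly stated assumptions: a single attention head with Q = K = V = LN(z) (no learned Q/K/V or output projections, no relative position bias), LayerNorm without learned scale or shift, and random MLP weights. All helper names are hypothetical. The shifted block rolls the map by ⌊M/2⌋ and adds a mask so that tokens from regions that were not adjacent before the shift do not attend to each other.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LN over channels; learned scale and shift are omitted in this sketch.
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def window_partition(x, M):
    """(H, W, C) -> (num_windows, M*M, C) for non-overlapping M x M windows."""
    H, W, C = x.shape
    x = x.reshape(H // M, M, W // M, M, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, M * M, C)

def window_reverse(win, M, H, W):
    """Inverse of window_partition: (num_windows, M*M, C) -> (H, W, C)."""
    C = win.shape[-1]
    x = win.reshape(H // M, W // M, M, M, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def shift_mask(H, W, M, s):
    """Additive mask (num_windows, M*M, M*M): -1e9 between different regions."""
    labels = np.zeros((H, W, 1))
    k = 0
    for hs in (slice(0, -M), slice(-M, -s), slice(-s, None)):
        for ws in (slice(0, -M), slice(-M, -s), slice(-s, None)):
            labels[hs, ws] = k  # region id in the cyclically shifted map
            k += 1
    lw = window_partition(labels, M)[..., 0]  # (num_windows, M*M)
    return np.where(lw[:, :, None] == lw[:, None, :], 0.0, -1e9)

def window_attention(x, M, shift=0):
    """Single-head attention inside M x M windows; shift > 0 gives SW-MSA."""
    H, W, C = x.shape
    if shift:
        x = np.roll(x, (-shift, -shift), axis=(0, 1))  # cyclic shift
    win = window_partition(x, M)                        # (nW, M*M, C)
    scores = win @ win.transpose(0, 2, 1) / np.sqrt(C)  # (nW, M*M, M*M)
    if shift:
        scores = scores + shift_mask(H, W, M, shift)
    scores -= scores.max(-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(-1, keepdims=True)
    out = window_reverse(attn @ win, M, H, W)
    if shift:
        out = np.roll(out, (shift, shift), axis=(0, 1))  # undo the shift
    return out

def mlp(x, W1, W2):
    return gelu(x @ W1) @ W2

def swin_block_pair(z, M, params):
    """Two consecutive blocks: W-MSA then SW-MSA, each followed by an MLP."""
    W1, W2, W3, W4 = params
    z_hat = window_attention(layer_norm(z), M) + z                # \hat{z}^l
    z = mlp(layer_norm(z_hat), W1, W2) + z_hat                    # z^l
    z_hat = window_attention(layer_norm(z), M, shift=M // 2) + z  # \hat{z}^{l+1}
    return mlp(layer_norm(z_hat), W3, W4) + z_hat                 # z^{l+1}

rng = np.random.default_rng(0)
H = W = 8; C = 16; M = 4
z = rng.standard_normal((H, W, C))
params = [0.1 * rng.standard_normal(s) for s in [(C, 4 * C), (4 * C, C)] * 2]
out = swin_block_pair(z, M, params)
print(out.shape)  # (8, 8, 16)
```

Alternating the two block types lets information cross window boundaries every second block while each attention call stays restricted to M×M tokens.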

Cyclic shift for efficient batch computation
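A small NumPy sketch of why the cyclic shift helps. Shifting the window grid naively by ⌊M/2⌋ cuts partial windows at the borders and increases the window count from ⌈h/M⌉×⌈w/M⌉ to (⌈h/M⌉+1)×(⌈w/M⌉+1). Rolling the feature map toward the top-left instead keeps the regular partition and window count, at the cost of some windows mixing sub-regions that were not adjacent, which is handled with a mask. The 8×8 map, M = 4, and the region-labeling scheme are illustrative assumptions, not the official implementation.

```python
import numpy as np

def ceil_div(a, b):
    return -(-a // b)

def naive_shifted_windows(H, W, M):
    # Shifting the grid by M//2 leaves partial windows at the borders.
    return (ceil_div(H, M) + 1) * (ceil_div(W, M) + 1)

def cyclic_shift_windows(H, W, M):
    # After the cyclic shift, the regular partition (and window count) is reused.
    return ceil_div(H, M) * ceil_div(W, M)

print(naive_shifted_windows(56, 56, 7), cyclic_shift_windows(56, 56, 7))  # 81 64

H = W = 8; M = 4; s = M // 2
ids = np.arange(H * W).reshape(H, W)           # token ids on the original grid
shifted = np.roll(ids, (-s, -s), axis=(0, 1))  # cyclic shift toward the top-left
assert np.array_equal(np.roll(shifted, (s, s), axis=(0, 1)), ids)  # reversible

# Label the sub-regions of the shifted map; within a window, tokens may only
# attend to tokens with the same label (all other pairs are masked out).
labels = np.zeros((H, W), dtype=int)
k = 0
for hs in (slice(0, -M), slice(-M, -s), slice(-s, None)):
    for ws in (slice(0, -M), slice(-M, -s), slice(-s, None)):
        labels[hs, ws] = k
        k += 1

bottom_right = labels[H - M:, W - M:].ravel()  # last window of the partition
mask = np.where(bottom_right[:, None] == bottom_right[None, :], 0.0, -np.inf)
print(np.unique(bottom_right))  # [4 5 7 8]
print(int((mask == 0).sum()))   # 64 allowed pairs out of 256
```

The top-left window contains a single region and needs no masking, while the bottom-right window gathers four regions from different corners of the original map; rolling back by ⌊M/2⌋ after attention restores the original layout.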